A Clustering-based Approach for Supporting Document-Category Integration

Authors

  • Chih-Ping Wei
  • Tsang-Hsiang Cheng
Abstract

Integrating relevant categorized documents into the existing categories of an organization or individual is an important issue in the e-commerce era. The existing categorization-based approach to document-category integration (specifically, ENB) has several limitations, including the assumption that the master and source catalogs use homogeneous categorization schemes and the requirement for large master categories as training data. In this study, we developed a Clustering-based Category Integration (CCI) technique to address the problems inherent to the categorization-based approach. Using ENB as the benchmark, the empirical evaluation results showed that CCI appeared to improve the accuracy of document-category integration in different integration scenarios and seemed to be less sensitive to the size of master categories than ENB.

Similar resources

Supporting Document-Category Management: An Ontology-based Document Clustering Approach

Automated document-category management, particularly document clustering, represents an appealing way of supporting a user’s search, access, and utilization of ever-increasing textual corpora. Traditional document clustering techniques generally emphasize the analysis of document contents and measure document similarity on the basis of the overlap between or among the feat...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Preserving User Preferences in Document-Category Management: An Ontology-based Evolution Approach

Preserving the user’s preference in document-category management is essential because it affects his/her search efficiency, cognitive processing load, and satisfaction. Prior research has investigated automated document-category evolution by using lexicon-based document-category evolution techniques, which take into account the document categories previously created by the user. However, comparin...

An Evolution-based Approach to Preserving User Preferences in Document-Category Management

Document clustering is critical to automated document management, whereby a set of documents is clustered into multiple categories, each containing similar or relevant documents. Most previous research assumes that document categories are time-invariant, i.e., that they do not evolve after creation. The adequacy of an existing category understandably may diminish as it includes influxes of new documen...

Publication date: 2003